{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "xYxEKPhTgRif"
   },
   "source": [
    "### You can also run the notebook in [COLAB](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_tutorial.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "-NIf_5W0gRkj"
   },
   "source": [
    "# Simple bot in DeepPavlov"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "GjTJTYIqgRk2"
   },
   "source": [
    "This tutorial describes how to build a simple trainable dialogue system with DeepPavlov framework. It shows one of the easiest ways to create a chatbot. All you need is just a dozen of dialogs from your domain with bot responses annotated for dialogue acts. The tutorial covers the following steps:\n",
    "\n",
    "0. [Data preparation](#0.-Data-Preparation)\n",
    "1. [Train bot](#1.-Train-bot)\n",
    "2. [Interact with bot](#2.-Interact-with-bot)\n",
    "\n",
    "\n",
    "An example of the final model served as a telegram bot is:\n",
    "\n",
    "![gobot_simple_example.png](img/gobot_simple_example.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 44268,
     "status": "ok",
     "timestamp": 1568799332537,
     "user": {
      "displayName": "Mikhail Burtsev",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD-UjGT1Q2KIGGrL9KU-xXovwU2v8j7wSsrT1Tj9Q=s64",
      "userId": "02998805542659340239"
     },
     "user_tz": -180
    },
    "id": "JeSE4a-SgRjo",
    "outputId": "bde9888c-654d-4e0f-dd4f-0dbc416aa283"
   },
   "outputs": [],
   "source": [
    "!pip install deeppavlov\n",
    "!python -m deeppavlov install gobot_dstc2_minimal"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "fbv3rMFngRlH"
   },
   "source": [
    "## 0. Data Preparation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "hTpb4EHbgRla"
   },
   "source": [
    "In this tutorial we will build and train a simple chatbot just from 10 dialogues. \n",
    "\n",
    "Reading data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "B5oak1V5gRlq"
   },
   "outputs": [],
   "source": [
    "from deeppavlov.dataset_readers.dstc2_reader import SimpleDSTC2DatasetReader\n",
    "\n",
    "\n",
    "class AssistantDatasetReader(SimpleDSTC2DatasetReader):\n",
    "    \n",
    "    url = \"http://files.deeppavlov.ai/datasets/tutor_assistant_data.tar.gz\"\n",
    "    \n",
    "    @staticmethod\n",
    "    def _data_fname(datatype):\n",
    "        assert datatype in ('val', 'trn', 'tst'), \"wrong datatype name\"\n",
    "        return f\"assistant-{datatype}.json\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 137
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 595,
     "status": "ok",
     "timestamp": 1568799767898,
     "user": {
      "displayName": "Mikhail Burtsev",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD-UjGT1Q2KIGGrL9KU-xXovwU2v8j7wSsrT1Tj9Q=s64",
      "userId": "02998805542659340239"
     },
     "user_tz": -180
    },
    "id": "I-GPAAWjgRmj",
    "outputId": "05f22005-56c2-48c7-882a-13a9d039b146"
   },
   "outputs": [],
   "source": [
    "data = AssistantDatasetReader().read('assistant_data')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0dcewhzTgRns"
   },
   "source": [
    "The training/validation/test data is stored in json files (`assistant-trn.json`, `assistant-val.json` and `assistant-tst.json`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 50
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 2259,
     "status": "ok",
     "timestamp": 1568799772627,
     "user": {
      "displayName": "Mikhail Burtsev",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD-UjGT1Q2KIGGrL9KU-xXovwU2v8j7wSsrT1Tj9Q=s64",
      "userId": "02998805542659340239"
     },
     "user_tz": -180
    },
    "id": "T_q_AMkCgRnO",
    "outputId": "b57a7fec-16aa-4261-ece7-965f7cdb0718"
   },
   "outputs": [],
   "source": [
    "!ls assistant_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "5elPwVGH8sFb"
   },
   "source": [
    "Let's take a look at the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 1937,
     "status": "ok",
     "timestamp": 1568801176503,
     "user": {
      "displayName": "Mikhail Burtsev",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD-UjGT1Q2KIGGrL9KU-xXovwU2v8j7wSsrT1Tj9Q=s64",
      "userId": "02998805542659340239"
     },
     "user_tz": -180
    },
    "id": "DOlrNl_9gRn9",
    "outputId": "966e2c29-8460-471c-ac6c-6951dc8eb7d1"
   },
   "outputs": [],
   "source": [
    "!head -n 310 assistant_data/assistant-trn.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "HbgaikNS9JY0"
   },
   "source": [
    "Create data iterator to organize data processing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 357
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 668,
     "status": "error",
     "timestamp": 1568803282114,
     "user": {
      "displayName": "Mikhail Burtsev",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD-UjGT1Q2KIGGrL9KU-xXovwU2v8j7wSsrT1Tj9Q=s64",
      "userId": "02998805542659340239"
     },
     "user_tz": -180
    },
    "id": "9NYptoABgRol",
    "outputId": "08c1e1b4-46cc-4be9-c4ab-98bc074e54a6",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from deeppavlov.dataset_iterators.dialog_iterator import DialogDatasetIterator\n",
    "\n",
    "iterator = DialogDatasetIterator(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "hYh26FBDgRpL"
   },
   "source": [
    "You can now iterate over batches of preprocessed dialogs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 318
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 762,
     "status": "ok",
     "timestamp": 1568799789488,
     "user": {
      "displayName": "Mikhail Burtsev",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD-UjGT1Q2KIGGrL9KU-xXovwU2v8j7wSsrT1Tj9Q=s64",
      "userId": "02998805542659340239"
     },
     "user_tz": -180
    },
    "id": "oMLknr2mgRpk",
    "outputId": "27d39825-3eac-4adb-c7ee-c3d12e8584ef"
   },
   "outputs": [],
   "source": [
    "from pprint import pprint\n",
    "\n",
    "for dialog in iterator.gen_batches(batch_size=1, data_type='train'):\n",
    "    turns_x, turns_y = dialog\n",
    "    \n",
    "    print(\"User utterances:\\n----------------\\n\")\n",
    "    pprint(turns_x[0], indent=4)\n",
    "    print(\"\\nSystem responses:\\n-----------------\\n\")\n",
    "    pprint(turns_y[0], indent=4)\n",
    "    \n",
    "    break\n",
    "\n",
    "print(\"\\n-----------------\")    \n",
    "print(f\"{len(iterator.get_instances('train')[0])} dialog(s) in train.\")\n",
    "print(f\"{len(iterator.get_instances('valid')[0])} dialog(s) in valid.\")\n",
    "print(f\"{len(iterator.get_instances('test')[0])} dialog(s) in test.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "PbSQDMHfgRqo"
   },
   "source": [
    "## 1. Train bot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "TgXWO32vgRqy"
   },
   "source": [
    "A policy module of the bot decides what action should be taken in the current dialogue state.The policy in our bot is implemented as a recurrent neural network (recurrency over user utterances) followed by a dense layer with softmax function on top. The network classifies user input into one of predefined system actions. Examples of possible actions are to say hello, to ask what is the weather or to suggest to drink tea. \n",
    "\n",
    "&nbsp;\n",
    "![gobot_simple_policy.png](img/gobot_simple_policy.png)\n",
    "&nbsp;\n",
    "\n",
    "All actions available for the system should be listed in a `assistant-templates.txt` file. Each action should be associated with a string of the corresponding system response.\n",
    "\n",
    "&nbsp;\n",
    "![gobot_simple_templates.png](img/gobot_simple_templates.png)\n",
    "&nbsp;\n",
    "\n",
    "Templates should be in the format `<act>TAB<template>`, where `<act>` is a dialogue action and `<template>` is the corresponding response."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "c6Ra1TzW-WGU"
   },
   "source": [
    "List of actions for our bot:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 100
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 1948,
     "status": "ok",
     "timestamp": 1568799821700,
     "user": {
      "displayName": "Mikhail Burtsev",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD-UjGT1Q2KIGGrL9KU-xXovwU2v8j7wSsrT1Tj9Q=s64",
      "userId": "02998805542659340239"
     },
     "user_tz": -180
    },
    "id": "lqg_cbfegRrJ",
    "outputId": "9da7386c-b783-41fa-8280-d1c22ecb6958"
   },
   "outputs": [],
   "source": [
    "!head -n 10 assistant_data/assistant-templates.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0EB74TkogRse"
   },
   "source": [
    "In essense, the dialogue policy module solves classification task, where a set of classes is defined in `assistant-templates.txt`. So, to train the dialogue policy network you need action label for each system's turn in training dialogues. Our assistant dataset provides `\"act\"` dictionary key that contains action associated with current response. Here is an example of training data for the policy network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 535
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 2172,
     "status": "ok",
     "timestamp": 1568799833831,
     "user": {
      "displayName": "Mikhail Burtsev",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD-UjGT1Q2KIGGrL9KU-xXovwU2v8j7wSsrT1Tj9Q=s64",
      "userId": "02998805542659340239"
     },
     "user_tz": -180
    },
    "id": "o-Ny-LEYgRsq",
    "outputId": "87f7a2b9-0d62-4ba7-bd3c-14cb1f697208"
   },
   "outputs": [],
   "source": [
    "!head -n 31 assistant_data/assistant-trn.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "o5P9piXwgRtl"
   },
   "source": [
    "For our bot we will use ML pipline for task-oriented conversational skill from DeepPavlov. We will train this skill with our dialogue data. \n",
    "\n",
    "Skills in DeepPavlov are defined by configuration files. So, we will use [minimal DSTC2 bot config](https://github.com/deepmipt/DeepPavlov/blob/master/deeppavlov/configs/go_bot/gobot_dstc2_minimal.json) ([more configs](https://github.com/deepmipt/DeepPavlov/blob/master/deeppavlov/configs/go_bot) are available) and change sections responsible for \n",
    "- embeddings,\n",
    "- response templates,\n",
    "- data and model load/save paths.\n",
    "\n",
    "Loading bot:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "X59MSkmBgRt3"
   },
   "outputs": [],
   "source": [
    "from deeppavlov import configs\n",
    "from deeppavlov.core.common.file import read_json\n",
    "\n",
    "gobot_config = read_json(configs.go_bot.gobot_dstc2_minimal)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "BVFgvwKFBKv0"
   },
   "source": [
    "Download pre-trained GLOVe embeddings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 53
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 1691,
     "status": "ok",
     "timestamp": 1568800095199,
     "user": {
      "displayName": "Mikhail Burtsev",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD-UjGT1Q2KIGGrL9KU-xXovwU2v8j7wSsrT1Tj9Q=s64",
      "userId": "02998805542659340239"
     },
     "user_tz": -180
    },
    "id": "XouQ1IBegRvR",
    "outputId": "8bbeac1b-72d8-45a6-87b6-673ee71f5cb4"
   },
   "outputs": [],
   "source": [
    "from deeppavlov.download import download_resource\n",
    "\n",
    "download_resource(url=\"http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt\",\n",
    "                  dest_paths=['assistant_bot/'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "KFhdvuWUBz5T"
   },
   "source": [
    "Configure bot to use downloaded embeddings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "FElG1xfjgRvq"
   },
   "outputs": [],
   "source": [
    "gobot_config['chainer']['pipe'][-1]['embedder'] = {\n",
    "    \"class_name\": \"glove\",\n",
    "    \"load_path\": \"assistant_bot/glove.6B.100d.txt\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "KU0o9uM5gRui"
   },
   "source": [
    "Configure bot to use templates:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "yAACg0IAgRuq"
   },
   "outputs": [],
   "source": [
    "gobot_config['chainer']['pipe'][-1]['nlg_manager']['template_path'] = 'assistant_data/assistant-templates.txt'\n",
    "gobot_config['chainer']['pipe'][-1]['nlg_manager']['api_call_action'] = None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "JV27LFatgRwE"
   },
   "source": [
    "Specify train/valid/test data path and path to save the final bot model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "pqhscSbhgRwK"
   },
   "outputs": [],
   "source": [
    "gobot_config['dataset_reader']['class_name'] = '__main__:AssistantDatasetReader'\n",
    "gobot_config['metadata']['variables']['DATA_PATH'] = 'assistant_data'\n",
    "\n",
    "gobot_config['metadata']['variables']['MODEL_PATH'] = 'assistant_bot'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "44c_Y8HsgR4l"
   },
   "source": [
    "The whole dialogue system pipeline looks like this:\n",
    "    \n",
    "![gobot_simple_pipeline.png](img/gobot_simple_pipeline.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kirToS-DgR4v"
   },
   "source": [
    "Train policy network:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 792
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 40132,
     "status": "error",
     "timestamp": 1568800144573,
     "user": {
      "displayName": "Mikhail Burtsev",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD-UjGT1Q2KIGGrL9KU-xXovwU2v8j7wSsrT1Tj9Q=s64",
      "userId": "02998805542659340239"
     },
     "user_tz": -180
    },
    "id": "ZRcmJBcvgR5D",
    "outputId": "d1ad0b56-d45a-4ae6-c7dc-cc88931dfed6"
   },
   "outputs": [],
   "source": [
    "from deeppavlov import train_model\n",
    "\n",
    "gobot_config['train']['batch_size'] = 4 # set batch size\n",
    "gobot_config['train']['max_batches'] = 30 # maximum number of training batches\n",
    "gobot_config['train']['val_every_n_batches'] = 30 # evaluate on full 'valid' split every 30 epochs\n",
    "gobot_config['train']['log_every_n_batches'] = 5 # evaluate on full 'train' split every 5 batches\n",
    "\n",
    "train_model(gobot_config);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "BMYLonE_gR_Q"
   },
   "source": [
    "Training on the dataset takes up to 5 minutes depending on gpu/cpu. See [config doc page](http://docs.deeppavlov.ai/en/master/intro/configuration.html) for advanced configuration of the training process."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "8CNlZyfSgSAi"
   },
   "source": [
    "# 2. Interact with bot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "kaUTLCl_gSAm"
   },
   "outputs": [],
   "source": [
    "from deeppavlov import build_model\n",
    "\n",
    "bot = build_model(gobot_config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "1UTvHL01gSA5"
   },
   "outputs": [],
   "source": [
    "bot([[{\"text\": \"good evening, bot\"}]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "YY_BdF-egSBT"
   },
   "outputs": [],
   "source": [
    "bot([[{\"text\": \"the weather is clooudy and gloooomy\"}]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "-Xf1gCdmgSBr"
   },
   "outputs": [],
   "source": [
    "bot([[{\"text\": \"nice idea, thanks!\"}]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Ad_GDanAgSCi"
   },
   "outputs": [],
   "source": [
    "bot.reset()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "r5312Df1gSC0"
   },
   "outputs": [],
   "source": [
    "bot([[{\"text\": \"hi bot\"}]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "jXQw_11jgSDM"
   },
   "outputs": [],
   "source": [
    "bot([[{\"text\": \"looks ok, the sun is bright and yesterday's rain stopped already\"}]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "-SKJicTCgSDe"
   },
   "outputs": [],
   "source": [
    "bot([[{\"text\": \"i dont wanna\"}]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "wdg4wl0dgSD9"
   },
   "source": [
    "You can also train a more advanced goal-oriented bot following [gobot_extended_tutorial.ipynb](https://github.com/deepmipt/DeepPavlov/blob/master/examples/gobot_extended_tutorial.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "name": "gobot_tutorial_simple(MB).ipynb",
   "provenance": [],
   "version": "0.3.2"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "accelerator": "GPU",
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
